PADAPT 1.0 – the Pannonian Dataset of Plant Traits

The applicability of existing plant trait databases is limited for studies of the flora and vegetation of eastern and central Europe and for large-scale comparisons across regions, mostly because their geographical coverage is limited and they incorporate records from several different sources, often from regions with markedly different climatic conditions. These problems motivated the compilation of a regional dataset for the flora of the Pannonian region (Eastern Central Europe). PADAPT, the Pannonian Dataset of Plant Traits, relies on regional data sources and collates data on 54 traits and attributes of the plant species of the Pannonian region. The current version covers approximately 90% of the species of the region and consists of 126,337 records on 2745 taxa. By including species of eastern Europe not covered by other databases, PADAPT can facilitate studies of the flora and vegetation of the eastern part of the continent. Although data coverage is far from complete, PADAPT meets the longstanding need for a regional database of the Pannonian flora.


Background & Summary
The trait-based approach has significantly advanced our ecological and evolutionary understanding in various fields of research, including vegetation science 1,2. To support trait-based analyses with suitable data, several international plant trait databases have been established in recent decades. Some of them compile data for a wide range of traits at the regional scale, e.g., BiolFlor for the German flora 3, LEDA for Northwest Europe 4, BROT for the flora of the Mediterranean 5, or AusTraits for the Australian flora 6. Other databases provide data for a specific group of traits at the global scale, e.g., SID, the Seed Information Database 7; D³, the Dispersal and Diaspore Database 8; and SylvanSeeds, a germination database of deciduous forests 9. Some databases cover not only the traits and attributes of the flora of a region but also its vegetation, such as PLADIAS, the Database of the Czech Flora and Vegetation 10. TRY 11, the most frequently used (meta-)database, incorporates several databases and thus provides global coverage for numerous plant traits.
Although data on a broad range of traits can be retrieved relatively easily from these databases, they incorporate records from several different sources, often from regions with markedly different climatic conditions and sometimes obtained with varying measurement standards, which is a shortcoming for certain analyses. Climate and local abiotic conditions can cause substantial intraspecific trait variability 12, which makes the application of such broad-scale databases problematic for regional-scale studies. Moreover, the geographical coverage of the existing European databases is limited, focusing mostly either on the flora of western and north-western Europe or on the southern, mostly Mediterranean parts of the continent 5,13.
Our dataset aims to cover the flora of the Pannonian biogeographic region, which is situated in the eastern part of Central Europe, surrounded by the Carpathians, the Alps, and the Dinaric Mountains. For the purpose of the dataset, we considered the Pannonian biogeographic region in the broadest sense; that is, we included all areas belonging to any of the overlapping territories of the Pannonian vegetation region, the Pannonicum floristic region, or the Pannonian biogeographic region recognised by the European Union (for an overview see Fekete et al. 14). The whole territory of Hungary is included in this region, along with some rather small, typically lowland areas of Slovakia, the Czech Republic, Austria, Slovenia, Croatia, Serbia, Romania, and Ukraine. Our main reason for choosing this region was that the territory of Hungary does not coincide with any (bio)geographical region, and it is more meaningful for the dataset to cover a biogeographical region than a political entity such as a country.
Because of this geographical focus of the existing databases, a large proportion of the Pannonian flora is not represented in them. These problems limit the applicability of the existing databases not only for studies of the Pannonian flora and vegetation, but also for studies in the eastern and central part of the continent in general, and for studies attempting large-scale comparisons across regions.
These issues motivated the compilation of a dataset focusing on the Pannonian flora. Here, we introduce PADAPT 1.0, the Pannonian Dataset of Plant Traits, which relies on regional data sources, collates data on a wide range of traits and attributes of the Pannonian flora, and makes them broadly available to the international scientific community. PADAPT 1.0 provides regionally collected data and covers many species with continental, Balkanic, and Pontic distributions; thus, it will promote studies of the flora and vegetation not only of the Pannonian region, but of the whole eastern half of the continent. The new dataset highlights data gaps, facilitates their targeted filling, and promotes the exploration of intraspecific trait variability. Data collection will continue, and the PADAPT team welcomes any researcher interested in contributing new regional data to PADAPT. In the coming years, we expect to release PADAPT 2.0, complemented with additional attributes and further species.

Methods
The checklist of taxa was based on the checklist of the Distribution atlas of vascular plants of Hungary 15, which is continuously updated with species newly discovered in the country and is therefore the most up-to-date checklist of the flora of Hungary. We adopted the checklist of the distribution atlas as of 1 January 2020 and used it to harmonise the data coming from different sources. The current version of the dataset, PADAPT 1.0, only includes species of the Pannonian region that occur within the territory of Hungary, but trait measurements were in part carried out on specimens collected outside Hungary 16,17. As several of the data sources do not distinguish between subspecies, subspecies are generally not distinguished in the dataset either, except when only one subspecies occurs in Hungary (e.g., Astragalus vesicarius subsp. albidus).
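The subspecies rule described in the preceding paragraph can be sketched as a small name-mapping function. This is an illustration only, assuming taxon names arrive as plain strings without author citations; the function `harmonise_name` and the `SINGLE_SUBSPECIES` lookup are hypothetical, and the single entry shown is the example given in the text. Synonym resolution against the atlas checklist is not covered.

```python
# Illustrative sketch only: the function and lookup table are hypothetical.
# Maps a source taxon name (no author citation) to the dataset-level name.

# Species represented by a single subspecies in Hungary keep that subspecies
# name; the one entry here is the example given in the text.
SINGLE_SUBSPECIES = {
    "Astragalus vesicarius": "Astragalus vesicarius subsp. albidus",
}


def harmonise_name(name: str) -> str:
    """Return the dataset-level name for a taxon reported by a source."""
    parts = name.split()
    if len(parts) < 2:
        raise ValueError(f"expected at least genus and epithet: {name!r}")
    # Genus + specific epithet; hybrid formulae (e.g. with '×') are not handled.
    species = " ".join(parts[:2])
    # Keep the subspecies only where a single subspecies occurs in Hungary;
    # otherwise collapse any infraspecific rank to species level.
    return SINGLE_SUBSPECIES.get(species, species)


print(harmonise_name("Salvia nemorosa subsp. tesquicola"))  # Salvia nemorosa
print(harmonise_name("Astragalus vesicarius"))  # Astragalus vesicarius subsp. albidus
```

A lookup table keeps the exceptions explicit and easy to audit as the checklist is updated.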
In parallel with the collection of existing data, the PADAPT team started a sampling campaign in the Pannonian region to measure leaf traits and seed mass, in order to expand the range of available trait data. The PADAPT protocol for measuring seed mass and leaf traits was based on the data standards of LEDA 4 and the protocols of Pérez-Harguindeguy et al. 18. These trait data have already been published 16,17, yielding new leaf trait records for 1156 species 16 and new thousand-seed mass (TSM) data for 281 species 17, which have been incorporated into PADAPT.
For inclusion in the dataset, we considered data published in books and peer-reviewed articles. There are four exceptions: the IUCN Red List category of species, as well as Conservation status (in Hungary), Conservation value (HUF), and Year of protection (in Hungary), the last three of which are based on a ministerial decree currently in force in Hungary (see Table 1). To reduce the effects of intraspecific variability and of differing climatic conditions in other regions, we focused on trait data measured in the Pannonian region and did not consider publications reporting trait data for checklist species measured in other regions. Some attributes are based on a single data source, while for others we collated data from several published sources (see Tables 1, 2). For traits with data from multiple sources (thousand-seed mass, seed bank persistence, and all leaf traits), the dataset includes multiple columns. The published sources differ in how they present measured values. Some present only the mean of multiple measurements (e.g., the mean of three leaves from each of ten individuals), while others report several data points per species (e.g., the means of leaves from different individuals, given separately). To unify these approaches, we took one mean value per data source per species, which in most cases results in one data point per species per locality. We applied a different approach only to chromosome numbers and ploidy levels, because these data originate from 33 different published sources, most of which contain data for only a very limited set of species.
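The "one mean value per data source per species" rule can be illustrated with a short sketch. It assumes, hypothetically, that the raw values from each publication have been flattened into `(species, source, value)` tuples; a source that reports only a single mean contributes one tuple, which passes through unchanged. The function name and source labels are illustrative, and the output mirrors the one-column-per-source layout described above.

```python
from collections import defaultdict
from statistics import mean


def one_mean_per_source(records):
    """Collapse raw values to one mean per (species, source) pair.

    records: iterable of (species, source, value) tuples (hypothetical format).
    Returns a dict keyed by species, mapping each source to its mean value,
    i.e. one column per source in a wide table.
    """
    grouped = defaultdict(list)
    for species, source, value in records:
        grouped[(species, source)].append(value)

    table = defaultdict(dict)
    for (species, source), values in grouped.items():
        table[species][source] = mean(values)
    return dict(table)


raw = [
    ("Stipa capillata", "source_A", 2.0),  # source_A reports two individuals
    ("Stipa capillata", "source_A", 4.0),
    ("Stipa capillata", "source_B", 5.0),  # source_B reports only a mean
]
print(one_mean_per_source(raw))
# {'Stipa capillata': {'source_A': 3.0, 'source_B': 5.0}}
```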
For seed mass category, we were able to assign further species to the categories of Csontos 19 on the basis of recent publications reporting seed mass measurements carried out in Hungary 20,21. The seed bank persistence index was calculated as the proportion of the available data indicating a persistent soil seed bank, ranging from 0 to 1, where 0 means that all available published data indicate a transient seed bank and 1 means that all data indicate a persistent seed bank. For dispersal strategy, species that had not been evaluated by Sádlo et al. 22 were assigned to the same dispersal strategies according to the descriptions provided by Sádlo et al. 22. Some species not previously classified in Soó's and Borhidi's phytosociological systems 23,24 were also assigned a phytosociological category based on literature data (see Table 1).
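As described, the seed bank persistence index is the number of records indicating a persistent seed bank divided by the total number of records for the species. A minimal sketch, assuming each published record has already been classified as either `"persistent"` or `"transient"` (these labels and the function name are hypothetical):

```python
def seed_bank_persistence_index(records):
    """Proportion of published records indicating a persistent soil seed bank.

    records: sequence of "persistent" / "transient" labels (hypothetical coding,
    one label per published record for the species).
    Returns a value from 0 (all transient) to 1 (all persistent), or None
    when no records are available.
    """
    if not records:
        return None
    unexpected = set(records) - {"persistent", "transient"}
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    n_persistent = sum(1 for label in records if label == "persistent")
    return n_persistent / len(records)


print(seed_bank_persistence_index(["persistent", "transient", "persistent", "persistent"]))  # 0.75
```

Returning `None` rather than 0 for species without records keeps "no data" distinct from "all data indicate a transient seed bank".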
A trait or attribute was included in PADAPT if appropriate data were available for a considerable proportion of the checklist taxa and if it was considered meaningful for future ecological studies. Following these criteria, we compiled the dataset from 109 published sources (Tables 1, 2).

Data records
The current version, PADAPT 1.0, consists of 126,337 individual records on 2745 taxa, based on 109 different published sources. PADAPT includes 54 attributes. A summary of all included traits and attributes is presented in Tables 1 and 2. The dataset is available from figshare 25 (https://doi.org/10.6084/m9.figshare.21937157.v2), and the dataset file 25 also includes a short explanation and the data sources for every attribute.

Technical Validation
Data included in PADAPT are based on published sources (peer-reviewed journal articles and books). Experts in the respective fields were involved in compiling the data for every attribute to improve accuracy and reliability. After compilation, all attributes were checked for errors by searching for abnormal values and outliers. In some cases, issues were resolved by re-checking the original publications or contacting the author(s). Presumably erroneous values were omitted from the dataset. Despite these measures, errors in the dataset are still possible, and we encourage users to report any errors to the authors. The PADAPT protocol for measuring seed mass and leaf traits was based on the data standards of LEDA 4 and the protocols of Pérez-Harguindeguy et al. 18 to ensure that the recently obtained trait data 16,17 included in PADAPT are reliable and comparable with other databases.
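The text does not state how abnormal values and outliers were identified. As one possible screen, shown here only as an illustration and not as the procedure used for PADAPT, Tukey's fences flag values lying more than `k` interquartile ranges below the first quartile or above the third; flagged values would then be checked against the original publication rather than deleted automatically. The function name and example values are hypothetical.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Illustrative screen only; flagged values are candidates for re-checking
    against the original publication, not for automatic removal.
    """
    if len(values) < 4:
        return []  # too few values for meaningful quartiles
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical thousand-seed mass values (g); the last is an order of magnitude off.
tsm = [1.0, 1.1, 0.9, 1.2, 1.05, 0.95, 1.15, 12.0]
print(flag_outliers(tsm))  # [7]
```

Quartile-based fences are robust to the very outliers they are meant to catch, which makes them a common first pass for trait data that span orders of magnitude across species; applying them within a trait and species, rather than across the whole column, would be the more sensible grouping.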
The current version (1.0) includes only species found within the territory of Hungary. Although the number of species in the Pannonian flora has not been established in the literature, we estimate that approximately 90% of the flora of the region is already represented in the current version.

Usage Notes
Besides the dataset deposited in figshare 25, the data presented here are also available in interactive form at www.padapt.eu.
All data included in PADAPT 1.0 are public, but the present paper should be appropriately referenced when using the data.

Table 1.
Overview of the 54 attributes included in PADAPT 1.0: attributes related to growth habit, strategy, reproduction, dispersal, karyology, distribution, and conservation (continued in Table 2). Notation: Categ., categorical variable; Cont., continuous variable; Ord., ordinal variable. *A list of chromosome numbers of the species reported in the literature.

Table 2.
Overview of the 54 attributes included in PADAPT 1.0 (continued from Table 1).